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Abstract 

Gradient-enhanced Uncertainty Quantification (UQ) has received recent at¬ 
tention, in which the derivatives of a Quantity of Interest (Qol) with respect 
to the uncertain parameters are utilized to improve the surrogate approxima¬ 
tion. Polynomial chaos expansions (PCEs) are often employed in UQ, and 
when the Qol can be represented by a sparse PCE, ri-minimization can iden¬ 
tify the PCE coefficients with a relatively small number of samples. In this 
work, we investigate a gradient-enhanced ^i-minimization, where derivative 
information is computed to accelerate the identification of the PCE coeffi¬ 
cients. For this approach, stability and convergence analysis are lacking, and 
thus we address these here with a probabilistic result. In particular, with an 
appropriate normalization, we show the inclusion of derivative information 
will almost-surely lead to improved conditions, e.g. related to the null-space 
and coherence of the measurement matrix, for a successful solution recov¬ 
ery. Further, we demonstrate our analysis empirically via three numerical 
examples: a manufactured PCE, an elliptic partial differential equation with 
random inputs, and a plane Poiseuille flow with random boundaries. These 
examples all suggest that including derivative information admits solution 
recovery at reduced computational cost. 
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1. Introduction 


In complex engineering system analysis, inherent variability in inputs 
and imperfect knowledge of the physics can lead to unfounded conhdence or 
unnecessary diffidence in understanding Quantities of Interest (Qol). Uncer¬ 
tainty Quantihcation (UQ) [1, 2, 3] as a study aims to develop numerical tools 
that can accurately predict Qol and facilitate the quantitative validation of 
the simulation model. 

To characterize uncertainty, probability is a natural framework. We 
model the uncertain inputs as a d—dimensional vector of independent ran¬ 
dom variables H := (Si,..., S^), with probability density function p(H). The 
Qol that we seek to approximate is denoted by a scalar m(H). Here we utilize 
polynomial chaos expansions (PCEs) [1, 4] to approximate m(H), assumed to 
have hnite variance. In this case, 'u(3) can be represented as an expansion 
in multivariate orthogonal polynomials '0j(3), i.e., 

OO P 

«(3) = ^ + et(H), (1) 

i=i i=i 

where Cj,j = 1, 2,..., are the corresponding PCE coefficients, and e* is the 
truncation error associated with retaining P terms of a sorted basis. The 
PCE coefficients can be computed by the projection 

Cj = j = E [u{S)^/Jj{S )], (2) 

where the operator E denotes the mathematical expectation. Here we assume 
that '0j(S) are normalized such that E = 1. 

Typically, a full P term approximation is not necessary, and we can re¬ 
strict to an unknown subset C C {!,..., P} such that |C|, the number of 
elements in C, is signihcantly smaller than P. In addition, \C\ is referred to 
as the sparsity of the approximation. We approximate 'u(H) in (1) then by 

w(3) (3) 

pc 

This concept of using one basis, indexed by {!,..., P}, to compute an ap¬ 
proximation in a small but unknown subset of those basis functions, indexed 
by C, falls within the context of compressive sampling [5, 6, 7, 8, 9]. 
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To identify PCE coefficients, we consider non-intrusive sampling meth¬ 
ods, in which deterministic solvers for the Qol are not modihed. Such 
methods include Monte Carlo simulation [10, 2], pseudo-spectral stochas¬ 
tic collocation [11, 12, 2, 13], least squares regression [14, 15, 16], and i\- 
minimization [17, 7,18,19, 8, 9, 20, 21], In this work, we adopt £i-minimization 
to estimate the coefficients, solving the problem 


argmin||c||i subject to ||it —^'c ||2 < (4) 

C 


where the vector c := (ci,..., cp) contains PCE coefficients, while the vector 
u and the so-called measurement matrix ^ contains function evaluations at 
realizations of the random input, H. Specihcally, denoting the zth realization 
of H as 



(5) 

«'(». j) := V'i (y>) . 

(6) 


In (4), (5 is a tolerance parameter necessitated by the truncation error, ei(H), 
to reduce the effect of overhtting. For this work, 5 is identihed via cross- 
validation [22]. 

It has been shown that as the number of samples, iV, increases, the prob¬ 
ability of accurately recovering PCE coefficients experiences a corresponding 
increase [9]. Often the deterministic solver of u is computationally demand¬ 
ing, and it is expensive to compute realized order to deal with 

this situation, coefficient estimation based on gradient-enhanced PCE has 
received recent attention [23, 24, 25, 26, 27, 21], Specihcally, the gradient 
information utilized is 



and 


-l)-d + k,j) 




k 


1,...,d. 


( 8 ) 


With this gradient-enhancement, c is solved by minimizing ||c|| 2 , in least 
squares regression, or ||c||i, in compressive sampling, subject to a constraint 
on 





Wc 


(9) 
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where W is a positive-diagonal matrix depending on the basis functions 
and their derivatives, which we shall specify in Section 3.2. In this way, N 
realizations of the random inputs H provide an -f-1) x P matrix, where 
the gradients du/dEk, k = l,...,(i, may be computed at each evaluation 
of H from, e.g., direct or adjoint sensitivity equations [28], or automatic 
differentiation [29]. We assume that m(H) and its partial derivatives are 
square integrable with respect to the measure for H, and are represented in 
the appropriate basis such that the Qol and its derivatives correspond to 
identical coefficients. 

In a related context, a gradient-enhanced sparse approximation based 
on f'l-minimization was proposed in [30], in which derivatives du/dEk, k = 

1.. .. ,d, are projected onto Legendre PCE, with a corresponding coefficient 
vector cq. 

1.1. Contribution of this work 

In this work, we investigate a gradient-enhanced £i-minimization ap¬ 
proach as seen in [21], in which the derivative information is empirically 
shown to improve the resolution of PCE coefficients. 

In Section 3, a theoretical contribution concerning the Restricted Isome¬ 
try Constant (RIC) is presented regarding recovery in f'l-minimization with 
derivative information based on Hermite PCE. The RIC is a constant as¬ 
sociated with the measurement matrix that provides fruitful (probabilistic) 
bounds for recovery of solutions via f'l-minimization [31]. Additionally, un¬ 
der an appropriate normalization introduced by W in (9), it is guaranteed 
that null-space, measurement matrix column inner-products, and coherence 
related measures are almost-surely improved, implying that stability for so¬ 
lutions computed by (4) are not reduced by including derivative information. 
These analyses are original to the authors’ best knowledge, and provide a 
framework for analysis of recovery in other PCE bases. Also, though not 
considered here, the approach may be extended to least squares polynomial 
chaos regression [16]. 

Three numerical experiments are used to demonstrate the gradient-enhanced 
£i-minimization in Section 4, among which a plane Poiseuille flow with ran¬ 
dom boundaries is simulated, and the derivative information is approximated 
via an adjoint sensitivity method. The empirical results agree with the theo¬ 
retical analysis, and suggest that the inclusion of derivative information can 
improve solution recovery at a lower overall computational cost. 
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The structure of the manuscript is as follows. In Section 2, we state 
our problem and introduce the formulation of the gradient-enhanced 
minimization approach. In Section 3, we present theoretical results concern¬ 
ing the stability and convergence of the gradient-enhanced £i-minimization 
approach, in the context of Hermite PCE. In Section 4, we demonstrate our 
analysis empirically via three numerical examples: a manufactured PCE, 
an elliptic partial differential equation with random inputs, and a plane 
Poiseuille flow with random boundaries. Section 5 presents the proofs to 
the results in Section 3. 

2. Method Synopsis 

2.1. Problem statement 

We use differential equations to model engineering systems on a domain 
P> G D G {1,2,3}, in which the uncertainty sources characterized by 
H may be represented in one or many relevant parameters, e.g., boundary 
conditions and/or initial conditions. The solution u is governed by the equa¬ 
tions 

£(a;, f, H; u(a;, f, H)) = 0, xeV, 

X(a;, H; ^(a:;, 0, H)) = 0, x eV, (10) 

t, S; M(a;, f, H)) = 0, xEdV, 

where £, X, and B are differential operators depending on the physics of the 
problem, the initial conditions, and the boundary conditions, respectively. 
Our objective is to approximate the Qol, here M(a;o,to,S), for some hxed 
spatial location Xq and time to- Since we denote the realizations of the 
random inputs by the corresponding output is M(a;o, To reduce 

notation, we drop the reference to Xq and to, and simply write m(3) and 

2.2. Polynomial chaos expansion (PCE) 

We rely on the PCE (1) to approximate the Qol, m(H). For convenience, 
we assume that the input random variables, are independent and identi¬ 
cally distributed according to the probability density function and dehne 
{'0jj.(Sfc)} to be the complete set of polynomials of degree 4 £ IdU{0} orthog¬ 
onal with respect to the weight function pk [32, 4]. Hence, the multivariate 
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orthonormal polynomials in H are given by the products of the univariate 
orthonormal polynomials, 

d 

= Y['^iki-k), ( 11 ) 

k=l 


where each i E {{ii,... ,id) ■ ik ^ ^ ^ {0}} is a d-dimensional multi-index of 
non-negative integers. For computation, we truncate the expansion in (1) to 
the set of P basis functions associated with the subspace of polynomials of 
total order not greater than p, that is Ylk=i — P- convenience, we also 
order these P basis functions so that they are indexed by P}, as in 

(1), where there should be no confusion in using either notation. The basis 
set has cardinality 


{d + p)\ 
d\p\ 


( 12 ) 


2.3. ii-minimization with gradient information 

We generate derivative information for the Qol, denoted by ug, and corre¬ 
spondingly evaluate the derivatives of 'ipjiS), j = 1,..., P, at the realizations 
i = 1,..., iV, stored in a matrix as in (7) and (8). For brevity, we 
dehne 


u = 



(13) 


and 



(14) 


where u G and ^ G is referred to as the gradient- 

enhanced measurement matrix. Gradient-enhanced £i-minimization solves 
the problem 


argmin||c||i subject to 

C 


u — ^Wc 


2 


<<5, 


(16) 


where 6 generally differs from the choice in (4) and is assumed to be 


normalized such that E 




= I, the P X P identity matrix. Here, 

W is a positive-diagonal matrix, whose dehnition is deferred until Section 3.2. 
We next begin a theoretical development to justify this approach. 
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3. Theoretical Discussion 

We present results supporting the premise that the inclusion of derivative 
information does not reduce the stability of solutions recovered via (15) when 
compared to solutions recovered via (4), i.e., in the absence of derivative in¬ 
formation. We refer to these solutions and the methods to attain them using 
the adjectives gradient-enhanced and standard, respectively. We perform 
our analysis here with Hermite polynomials as they possess the convenient 
property that they and their derivatives are orthogonal with respect to the 
same measure, as by (5.5.10) of [33]. While we consider the Probabilists’ 
polynomials here for exposition, the Physicists’ polynomials would produce 
analogous results. Though we do not consider the details here, this analysis 
may be extended to the case of Laguerre or Jacobi polynomials where the 
derivative polynomials form an orthogonal system with respect to a measure 
that differs from the orthogonality measure, yet is still explicitly known, as 
by (5.1.15) and (4.21.7) of [33], respectively. 

First, we motivate and summarize results for standard fi-minimization. 
Then we expand on these results in the case of gradient-enhanced £i-minimization. 
The path of this analysis flows through the Restricted Isometry Constant 
(RIC) [5], which is denoted by Js($) and is dehned to be the smallest num¬ 
ber satisfying 


(1-5.W)II?/Il2<ll^?/Il2<(l + 5.W)II?/Ili (16) 

Here, Js(^) yields a uniform bound on the spectral radius of the submatrices 
of $ formed by selecting any s columns of $. Often, the matrix being 
considered is clear from context, and we then shorten (5s($) to Related to 
the RIC are restricted isometry properties that occur when the RIC reaches 
a small enough threshold, and guarantee that fi-minimization with the given 
matrix is a stable computation. An example of such a restricted isometry 
property is given in Theorem 3.1 from [34]. This theorem shows that if 
S 2 s < 3/(4 -|- \/6), where s is dictated by the specihc problem, then a stable 
recovery is assured. 

Theorem 3.1. [34] Let c G represent a solution we seek to approximate, 
and let c be the solution to (4)- Let 

rj := — u, 
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denote the contribution from sources of error, and let e from (f), he chosen 
such that llr/lll < e. If 

62 s{^) < d, := 3/(4 + \/ 6 ) ^ 0.4652, 

then the following error estimates hold, 

||c — c ||2 < inf 

y s ||cs||o<s 

||c —c||i<C 3 inf \\Cs — c\\i + Cie^/s, 

||Cs||o<S 

where ci, 02 , 03 , and 04 depend only on 62 s, and || • ||o refers to the number of 
non-zero elements of the vector. 

We note that, related to the discussion in Section 3.1, e in Theorem 3.1 
may be selected so that IIT 7 II 2 < 0 holds with high probability. Unfortunately, 
identifying the RIC for a given matrix requires a computation for every sub¬ 
matrix of s columns, which is intractable in most situations of interest. As a 
practical alternative we instead choose to bound the RIC in a probabilistic 
sense, allowing us to identify a probability that the RIC is below a chosen 
threshold. In this way we can guarantee that a restricted isometry property, 
such as the one of Theorem 3.1, holds with a certain probability. To do so, 
we introduce a dehnition of coherence, hrst considering the standard case, 
before expanding its dehnition to the gradient-enhanced case. 

3.1. Standard ii-minimization analysis 

We consider here an approach that uses arguments similar to those in [35, 
9]. We note that those works did not proceed through the RIC as we do here. 
First, let Q be an arbitrary subset of the sample space for H, that is the values 
which H can take. Here, Q is used to truncate the domain to one on which 
the basis functions, here Hermite polynomials, can be uniformly bounded. 
The coherence parameter for the standard approach [35, 9] is dehned to be 

hS := sup (17) 

which for a precompact Q is guaranteed to be hnite. An example of a Q 
suitable for use with Hermite polynomials, and used in [9, 16], is 

2 : 11^112 < (4 + Op^d)p + 2 }, 
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— c||i + C2e; 


( 18 ) 


where is a positive constant, which may be arbitrarily small bnt close to 
zero in an asymptotic analysis of the behavior of Hermite polynomials. We 
note that this trnncation has not been analyzed when the nnmber of samples 
is exponentially greater than the nnmber of basis fnnctions; however, this is 
not an issne here where the nnmber of samples is typically less than or not 
snbstantially greater than the nnmber of basis fnnctions. The definition of 
coherence parameter as in (18) leads to the following theorem, taken from 
Theorem 4.1 of [9]. 

Theorem 3.2. For d-dimensional polynomials of order p > 1, the coherence 
in (17) is bounded by 

fiQ<Co- Cl (19) 

where Cq and Ci are modest constants depending on d, p, As p/d ^ oo, 

Cq decreases to 1, and Ci decreases to a limit o/exp(2 — log(2)) ^ 3.7. 

This shows an exponential dependence of the coherence parameter on p. 
Let Xk denote a row vector consisting only of basis polynomial evalnations 
at We bonnd the RIC for the matrix ^ defined as in (6). 

Theorem 3.3. For any chosen Q, we may bound the RIC in a probabilistic 
sense by 

P(ds <t)> P(Q)^ - exp \-Cq^^ + s + log(2s) + slog(P/s) 

V 

Remark 3.1. We note that we do not generally reguire arbitrarily small 6s, 
as is seen in Theorem 3.1. The constant Cq scales with 

eQ-.= \\E{X^X\^eQ)-l\\^, (20) 

which is a bias that is negligible in practical contexts for Q as in (18) [9]. 
Truncating in this way, neither Cq nor¥{Q) are problematic in practice. 

Remark 3.2. While our primary focus here is ii-recovery, the RIC cor¬ 
responding to s = P is useful for analyzing the stability of a least sguares 
solution, and so this result is also applicable to i 2 -minimization. For ways 
in which this parameter may bound error from solutions computed via 12 - 
minimization we point the interested reader to [36, 16]. In this case, a slight 
adjustment to the proof gives the bound 

P(dp < t) > P(Q)'^ - 2Pexp (-Cq^ 

V PRq 
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The following corollary, which follows from a rearrangement of the resnlt 
of Theorem 3.3, highlights the relationship between several qnantities. 

Corollary 1. To insure that 5s < 5-i, with probability p*, it is sufficient to 
take Ah satisfying 

[s + log(2s) + s log(P/s) - log - p*)] . 

For example, using 5* as in Theorem 3.1, gives a guarantee for stability 
of solutions to (4) when the number of samples N satishes 

N > [s + log(2s) + slog(P/s) - log (P(Q)^ - p*)] . 

We note that N appears on both the left and right sides of the equation. This 
arises from the truncation, which has a technical issue when N is exponen¬ 
tially larger than P; specihcally, the probability that at least one sample had 
fallen in becomes large, while the analysis relies on this event being rare. 
As N is very often smaller than P, and rarely chosen to be exponentially 
larger than P, this issue is not of practical concern. Next, we extend these 
results to the case where gradient information is included. 

3.2. Gradient-enhanced ii-minimization analysis 

In the previous case the rows of ^ were independent, while ^ has a 
more subtle independence structure, as only the sets of rows associated with 
the independent samples are independent. Related to this point, we let the 
(d -|- 1) X P block of independent information related with the kth sample be 
given by Xk, with a generic realization given by X. Specihcally, 

X{i,j) = ^{^), i = l,...,d; 

X(d + l,j)=^,(0- 

That is, the last row corresponds to the realizations of the function, while 
the hrst d rows correspond to the derivative information. We note that there 
is no effect in the computations considered here by rearranging the rows 
of the matrix The RIC will necessarily be larger if the norms of the 
matrix columns are different. Adjusting for this can be done by adjusting 
the basis functions themselves. In the case of standard £i-minimization. 
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we use orthonormal polynomials. Here, we will wish to include derivatives 
of those polynomials as well, requiring a different normalization. We use 
the Probabilists’ Hermite polynomials, and the following lemma is used to 
identify this normalization. 

Lemma 3.1. For d-dimensional orthonormal Probabilists’ Hermite polyno¬ 
mials 'ipi and 'ijjj with order ik^jk in dimension k, 



where Sij is the Kronecker Delta. 

This suggests a different normalization of the basis functions to enforce 
that columns of the gradient-enhanced measurement matrix have the same 
expected ^ 2 -iiorm. We refer to this normalization as gradient-normalization. 
Specihcally, we multiply the orthonormal basis function 'ipi by 

/ d \ -V2 

Wi := f 1 + ^4j , (22) 

to gradient-normalize. We note that it is these weights that dehne the W 
in (9) and (15). In this work, we assume that when derivative informa¬ 
tion is included, that those basis functions are gradient-normalized, and that 
when derivative information is not included, that those basis functions are 
orthonormal, which we refer to as standard-normalization. This insures that 
the expected norms for the columns of the sampled matrices are consistent 
in both cases. 

Recalling that N denotes the number of samples used, our analysis focuses 
on the Gramian matrix 


(23) 

' k=l 

Here, M = for the standard approach, while M = for 

the gradient-enhanced approach. We now present some summarized results 
for the standard approach that will be compared to the results presented 
for the gradient-enhanced case. To analyze the spectrum of M in the case 
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of gradient-enhanced f'l-minimization, we use the following dehnition, which 
generalizes the f'l-coherence as studied in [35, 9, 34] and dehned in (17). Let 
Q be an arbitrary subset of hi, which we use to truncate the sample space to 
insure a uniformly bounded polynomial system, e.g. as in [35, 9, 16], and let 

(3q := snp \\X{:,k)\\l. (24) 

This parameter is a generalization of iiq in the case where rows are not inde¬ 
pendent, but sets of rows are. Note that in the case that X is a row vector, 
then this dehnition reduces to (17). Specihcally, the results of Section 3.1 all 
hold when substituted for this parameter, which we highlight as a theorem. 

Theorem 3.4. The theorems of Section 3.1 hold for the gradient-enhanced 
case when fiQ is replaced by (3 q. 

We conclude our analysis with three results which demonstrate that 
the inclusion of derivative information, coupled with gradient-normalization, 
does not reduce stability over the corresponding approach without derivative 
information. 

The hrst result is an inequality concerning the coherence parameter de¬ 
hned in (17) and (24), showing that including derivative information does 
not require any weakening of the bounds of Section 3.1 in the case without 
derivative information. This inequality then directly applies to all of the 
theorems in Section 3.1. 

The second result is a direct null-space comparison of the two diherent 
matrices, which is known to be fundamental for recovery of exactly-sparse 
solutions [5, 37, 38]. Specihcally, as H’l'c — fill < 5 (or HT'c — it|| < 5) is 
enforced, the diherence between potential solutions is close to an element 
of the null-space of ^ (or T'), so reducing the dimension of the null-space 
correspondingly reduces the space of potential solutions. 

The third result concerns a bound on the inner-products of columns of 
the measurement-matrix. This is related to the RIC in that if the inner- 
product between several pairs of columns is of large absolute value then a 
linear combination of those columns will have small norm, resulting in a larger 
RIC. Similarly, if those inner-products are of small absolute value, then no 
linear combination will have a small norm. An analogous observation may 
be made regarding a linear combination of columns having a much larger 
norm. For this reason it is benehcial if the inner-product between columns 
is of small absolute value. 
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Theorem 3.5. Let ^ be a realized measurement matrix with derivative in¬ 
formation that is gradient-normalized. Similarly, let ^ be a realized mea¬ 
surement matrix with standard-normalization and no derivative information. 

Assume that and are formed from the same realized input samples, 
SO that up to row weighting, ^ is a sub-matrix of Then the 
following statements related to the recovery of solutions via ii-minimization 
hold. 

Rl. Using the definition in (24) for the two different approaches. 


and this ineguality is almost-surely strict. 

R2. IfAff) represents the null-space, thenAfifi!) C 7\/'(^), and this is 
almost-surely a strict subset when ^ is undersampled. Specifically, it almost- 
surely holds that dim(A/’(^)) = max{0,P — (d + l)N} while dim(A/’(^')) = 
max{0, P — N}. 

R3. If subscripts of matrices correspond to the columns, and ik denotes 
the order of the basis polynomial in the ith column in the kth dimension, then 
the associated inner product of columns is bounded by. 


sup < sup 




(1 




<sup|(^'i,^'j)l- (25) 

i¥3 


Remark 3.3. The theorem is presented for full gradient information to ease 
presentation, but the three points generalize to the case that derivative in¬ 
formation is included for a fraction of samples. Specifically, (1) holds with 
an appropriate adjustment to the dimensionality, (2) holds with an adjust¬ 
ment to the basis dependent multiplicative constant, and (3) holds ifhas 
derivative information for only a few samples. In summary adding deriva¬ 
tive information for even a percentage of the samples leads to bounds as in 
Theorem 3.5. 


3.3. A note on potentially contrasting solutions 

It is of practical importance to note what functions are recovered in an 
asymptotic sense by the gradient-enhanced and standard approaches, as they 
may differ significantly. The gradient-enhanced method gives u approximat¬ 
ing M in a Sobolev type loss function, 


L{u, u) 


\u 


u 


lll(S,iV) + 'Yh 


k=l 


d{u — u) 
dEk 


2 


12{S,N) 


(26) 
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where the £ 2 ( 2 , N) indicates the discrete £2 norm using N evaluations drawn 
from realizations of H, here ..., This norm is also normalized by 

N~^ so that as N goes to inhnity this norm tends to / 12 (H), the standard £2 
norm associated with the distribution of H. Specifically (15) guarantees that 
NL{u,u) < 6. In contrast, without derivative information using standard- 
normalization and producing a solution via (4), gives u approximating u such 
that — < 6, that is the partial derivatives of the approximation 

need not be approximated by those of the target function. As these loss 
functions differ, so too does the limiting solution produced by each method 
for a finite expansion order p. However, the sequence of approximations given 
by the two loss functions and by using increasing p (and accordingly N) will 
converge to u in the £ 2 ( 2 ) sense as p,N ^ 00 . 

4. Numerical Results 

In this section, we empirically demonstrate the gradient-enhanced £ 1 - 
minimization approach via three numerical examples: a manufactured PCE; 
an elliptic PDE with stochastic coefficient; and a plane Poiseuille flow with 
random boundaries. To compare the standard and gradient-enhanced £ 1 - 
minimization solutions, we define an equivalent sample size N, 

N-.= N, + uNg, (27) 

which accounts for the added cost of computing the derivative information. 
In (27), Nf. is the number of samples without derivative information, Ng is 
the number of samples with derivatives (along all d directions), and z/ is a 
positive parameter depending on the problem at hand and the approach em¬ 
ployed to compute the derivatives. For the example of Section 4.2, the cost 
of generating d derivatives of the Qol, obtained by the adjoint sensitivity 
method, is roughly the same as that of evaluating the Qol, thus implying 
that V = 2. For transient problems for which the cost of solving the adjoint 
equations for derivative calculations may be considerably more than that of 
a single Qol evaluation, then v > 2. Nevertheless, we here present all cost 
comparisons in terms of the number of equivalent sample size N in (27), 
for choices of v that we shall specify. For simulations based on standard 
£i-minimization, we set N = N^- Additionally, for the interest of conve¬ 
nience, we ignore the cost of solving the £i-minimization problems in (4) and 
(15). This is a valid assumption as often that cost is negligible relative to 
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the cost of evaluating the Qol or its derivatives. For terminology, if X% 
of samples used in the computation of the solution contain derivative infor¬ 
mation, then we say that method is X% gradient-enhanced. In this way, 
the standard approach without derivatives would give a solution that is 0% 
gradient-enhanced, denoted as standard in the subsequent hgures. Similarly, 
if all samples include derivative information, the associated solution would 
be 100% gradient-enhanced. 

4.I. Case I: A manufactured PCE 

First, we consider the reconstruction of a manufactured PCE, in which 
the sparsity and the entries of the coefficient vector c are a priori prescribed. 
Specihcally, we set the dimension of the expansion to d = 25 and use a 
p = 3 order PCE (hence P = 3276 basis functions) to manufacture the Qol 
m(S). To generate c, we hrst draw its P entries independently from the 
standard Gaussian distribution. We then retain \C\ G {50,150} coefficients 
with largest magnitude and set the rest to zero. This gives a randomized 
sparsity support. Finally, the realizations of m(S) and its derivative with 
respect to Ek, k = 1 ,... ,d, are generated by 


p 



(28) 


and 



(29) 


respectively. We then approximate the PCE coefficients c via (4) and (15) 
from these generated data. 

f.l.l. Results 

Beginning with the \C\ =50 case, we seek to recover the manufactured c. 
If the computed solution, denoted by c, has a relative root-mean-square-error 
(RRMSE) below 0.01%, then we call it a successful recovery of c. 

We hrst consider the case where evaluations of 'u(H) and its derivatives 
are exact, i.e., noise free. In Fig. 1, we compare the probability of successful 
recovery for gradient-enhanced and standard approaches, using 100 indepen¬ 
dent replications for each N. To set iV, we pretend the cost of evaluating d 
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derivatives of 'u(H) is the same as that of evaluating 'u(H), and therefore we 
set z/ = 2. 



(a) (b) 


Fig. 1: Probability of successful recovery of manufactures PCE with sparsity |C| = 50 via 
gradient-enhanced and standard £i-minimization. (a) 100% vs. 0% gradient-enhanced, 
(b) 20% vs. 0% gradient-enhanced. ( gradient-enhanced, standard) 

In Fig. la, we see that 100% gradient-enhanced £ 1 -minimization helps in 
reducing the computational effort to recover c, while Fig. lb, demonstrates a 
notable, but less, improvement for 20% gradient-enhanced £i-minimization. 
The hgure suggests that adding derivative information is a cost-effective 
means to increase the probability of successfully recovering c at low sam¬ 
ple sizes N. 

Fig. 2 shows similar results for sparsity \C\ = 150. Since \C\ is larger 
in this case, both standard and gradient-enhanced methods require more 
samples for recovery with a given success probability. This is consistent with 
the sampling rate presented in Corollary 1. We, however, note that the 
inclusion of derivative information still enhances the solution recovery. 
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(a) (b) 


Fig. 2: Probability of successful recovery of manufactures PCE with sparsity \C\ = 150 via 
gradient-enhanced and standard £i-minimization. (a) 100% vs. 0% gradient-enhanced, 
(b) 20% vs. 0% gradient-enhanced. ( gradient-enhanced, standard) 

In practice, there is often error (or noise) in the evaluation of 'u(H) and 
its derivatives, with the latter being more prone to errors. To model such 
inaccuracies, here we multiplying the realizations of 'u(H) and its derivatives 
from (28) and (29), respectively, by independent realizations of (1 + cat), 
where ca? is a zero mean Gaussian random variable with variance 10“®. In 
Fig. 3, we consider the sparsity \C\ = 50 case and compute the probability 
of successful solution recovery as a function of N. Similar as in the previous 
test cases, we use an RRMSE error of 0.01% to identify a successful recovery. 
We observe that the inclusion of derivative information, while imprecise, still 
improves the performance of the standard £i-minimization. 
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Fig. 3: Probability of successful recovery of gradient-enhanced and standard i\- 
minimization for the manufactured PCE case with sparsity \C\ = 50. (a) 100% gradient- 
enhanced. (b) 20% gradient-enhanced ( gradient-enhanced, -»■- gradient-enhanced 
with noisy ug only, gradient-enhanced with both noisy u and ug, - 0 - standard, 

standard with noisy u) 


4-2. Case II: Two-dimensional elliptic PDE with random coefficient 
We next consider the two-dimensional (in space) elliptic PDE 

— V ■ {a{x, H) VM(at, H)) = 1, x eV = [0, 1]^ , 
u{x, H) = 0, X e dV , 


where the diffusion coefficient a{x,S) is modeled by the lognormal random 


field 


a{x, H) = exp 


d 

k=l 


(31) 


Here, d = 30, and k = 1,..., d, are independent standard Gaussian 
random variables. In addition, a = 0.1, cXq = 0.5, and {(j)k}t=i the eigen¬ 
functions corresponding to the d largest eigenvalues {Xk}'l^i of the Gaussian 
covariance kernel 


Caa{Xi,X2) = exp 


(Xi - X2f 

II 


(di - y2? 

II 


(32) 


with correlation length Ic = 1/16. 

The Qol m(H) is chosen as the solution to (30) at location x = (0.5, 0.5), 
i.e., m(H) = u ((0.5, 0.5), H). The derivatives du/dEk are computed using the 
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adjoint sensitivity method explained in detail in Section 4.2.1. Both forward 
and adjoint solvers are implemented in the hnite element method (FEM) 
project FEniCS [39]. 


4 -2.1. Adjoint sensitivity derivatives 

The adjoint sensitivity methods are commonly used in research areas such 
as sensitivity analysis [40, 41], optimization [42, 43], shape design [44, 45], 
etc., to compute the derivatives of the solutions of interest with respect to 
the underlying model parameters. In this work, we adopt the discrete adjoint 
sensitivity method to compute derivatives of the Qol at the H samples. 

In detail, for the interest of convenience, we consider the discrete for¬ 
mulation of the generic PDE in (10), at a hxed time, given by the residual 
equation 

H) = 0, (33) 

where w G contains the discrete values of solution over the spatial 

domain P, and M is the number of solution degrees-of-freedom. Recalling 
that m( 3) (here a scalar functional of the solution) denotes the Qol, we seek 
to compute the sensitivity derivatives du/dE^ from 


du du du dw 

dEk dEk dw dEk' 


(34) 


Taking the derivative of (33) with respect to Ek, we have 


dn dn dw 

dEk ^ dw dEk 


(36) 


which results in dw/dE^ = —{dTZ/dw) ^ dlZ/dEj.. Plugging this in (34) 
gives 


which can be rewritten as 


du du du / d'JZ\ ^ dTZ 
dEk dEk dw \dw J dEk 


du 


d^k 


du - 7 . dlZ 
dEik dEik 


(36) 


where A is the solution to the discrete adjoint equation 

T / rk \ T 


dn 

dw 


A = - 
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du 

dw 


(37) 















For the case of elliptic PDF (30), dlZ/dw is the symmetric stiffness ma¬ 
trix of the FEM discretization and dlZ/d'E^ in (36) can be computed semi- 
analytically from (31) and the FEM formulation of (30). Here, we assume 
the inverse or a factorization of dlZ/dw is not stored when solving for w, 
and that the total cost of obtaining dlZ/dSk is smaller than that of comput¬ 
ing w. Therefore, the cost of solving for A from (37) is roughly the same as 
solving for w, which in turn suggests that u = 2 in (27). 

4-2.2. Results 

We approximate 'u(H) in a Hermite PCE with total degree p = 3, and seek 
to approximate the hrst 2500 coefficients. We sort the elements of {i^j} such 
that, for any given total order basis, the random variables with smaller 
indices k contribute hrst to the basis. 

To solve for the coefficients, we hrst generate the realizations u and uq 
using a 256 x 256 uniform, linear FEM mesh, which resolves both quanti¬ 
ties with low numerical errors. In Fig. 4, we compare the mean and stan¬ 
dard deviation of the RRMSE for solutions computed by the standard and 
gradient-enhanced £i-minimization, using 100 independent replications. The 
reference PCE coefficients are computed using least squares regression [16] 
with 10000 solution realizations and yields a relative error of 0.36% for 1000 
additional validation samples. From Fig. 4, we observe that higher accuracies 
are achieved by the gradient-enhanced f'l-minimization with the same number 
of samples N. In Figure 5, we show the magnitude of the approximate PCE 
coefficients computed via standard and gradient-enhanced £i-minimization 
with = 80 samples. More accurate coefficient estimates are obtained by 
the gradient-enhanced f'l-minimization. 
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(a) (b) 

Fig. 4: Comparison of the statistics of the RRMSE in reconstructing u ((0.5, 0.5), H), where 
the realizations of u and its derivatives are computed on a uniform 256 x 256 FEM mesh, 
(a) Mean of RRMSE. (b) Standard deviation of RRMSE. ( 100% gradient-enhanced, 

standard) 



Index of PCE coefficients (j) 

Fig. 5: Approximate PCE coefficients of u ((0.5,0.5), H) with iV = 80 vs. the reference 
coefficients obtained by least squares regression. (• reference, o standard £i-minimization, 
□ gradient-enhanced £i-minimization) 

To study how the accuracy of derivative information affects the accuracy 
of solution obtained by the gradient-enhanced f'l-minimization, we repeat 
this experiment on a coarser 16 x 16 mesh, where the derivative informa¬ 
tion is noticeably less accurate. We observe from Fig. 6 that, the accuracy 
improvement achieved by the gradient-enhanced £i-minimization is not as 
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considerable as in the case of 256 x 256 mesh. This snggests that high accn- 
racy on the derivative samples uq may be reqnired for the gradient-enhanced 
£i-minimization to be most effective. 



Fig. 6: Comparison of the statistics of the RRMSE in reconstructing u ((0.5,0.5), H) via 
gradient-enhanced and standard £i-minimization. (a) Mean of RRMSE. (b) Standard 
deviation of RRMSE. ( 100% gradient-enhanced, standard. Dashed and solid 

lines, respectively, correspond to the 16 x 16 and 256 x 256 mesh simulations.) 

4-3. Case III: Plane Poiseuille flow with random boundaries 

We consider a 2-D plane Poisenille flow with random bonndaries as de¬ 
picted in Fig. 7. 



Fig. 7: Schematic figure of plane Poiseuille flow with random boundaries. 


The average width of the channel is 2R = 0.2, and the width of the 
channel is model as 2 R{x) = 2R + 2r(a;, H), where 2 r{x,S) describes the 
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random fluctuation of the channel width around 2R. We use d = 20 inde¬ 
pendent standard Gaussian random variables fc = 1,..., d, to represent 
the uncertainty in R, and let 


r(a;, S) 


exp 


d 

fc=i 


(38) 


Here, f = —4, cr^ = 0.5, and and {0fc(3^)}fc=i are the d largest 

eigenvalues and corresponding eigenfunctions of the exponential covariance 
kernel 

Grr(a^i,a^ 2 ) = exp ^ , (39) 

where the correlation length G = 1/21. 

We seek to investigate the steady state velocity held of the flow, which is 
governed by the incompressible steady state Navier-Stokes equations 


iv ■ y)v — = —Vp, 

Re 

X G 'P(H), 

(40) 

V ■ V = 0, 

X G T’(H), 


dp 

dx 

X G T’(H), 


r; = 0, 

y = 



where the Reynolds number Re = 60, and p denotes pressure. Notice that in 
(40), we assume that all physical quantities are non-dimensional. The flow 
is driven by a pressure gradient G = —0.1. 
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Fig. 8: Velocity magnitude contours with two independent realizations of the random 
inputs. 


The Qol is t;a;(0.5,0), the horizontal velocity at (0.5,0), and we approxi¬ 
mate it in a Hermite PCE of total degree p = 3. 


4.3.1. Adjoint sensitivity derivatives 

To compute the derivative information, we again adopt the adjoint sen¬ 
sitivity method, in which we approximate &K./d'Ek, A; = 1,..., d, in (36) via 
hnite difference quotient 


dn _ 7^(m, 5 + A5^) - 7^(tn, 5) 
dEk e 

Here, the mth entry of the vector AH^ is defined as 

m = k, 
m ^ k. 



(41) 


(42) 


To solve the non-linear problem (40), we employ a standard Newton solver, 
in which dlZ/dw from (37) is the Jacobian matrix. In (41), the discrete 
representation of the velocity held, w, is kept unchanged; hence, (40) does not 
need to be solved again. The perturbed residual H-|-AH^) is computed 
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by deforming the mesh to conform to the geometry corresponding to H+AH^, 
without recomputing w. Compared to solving the adjoint equation (37), the 
cost of calculating (41) is negligible. Additionally, in our deterministic solver, 
computing w required on average 3 Newton steps. Therefore, the extra cost 
of computing derivative information is roughly equivalent to 1/3 of the cost 
of computing w, which in turn suggests setting u = 4/3 in (27). 

4-3.2. Results 

We consider 0%, 20% and 100% gradient-enhanced £i-minimization, with 
N G {20,40,80,160}. 




(a) (b) 


Fig. 9: Comparison of the statistics of the RRMSE in reconstructing Vx ((0.5,0),H) via 
standard and gradient-enhanced .^i-minimization, with 100 independent replications, (a) 
Mean of RRMSE. (b) Standard deviation of RRMSE. ( 100% gradient-enhanced, 

20% gradient-enhanced, standard) 


Fig. 9 displays the comparisons of the mean and standard deviation of the 
RRMSE, with 100 independent replications of computed PCE coefficients, 
showing that gradient-enhanced £i-minimization again leads to cost-effective 
accuracy improvement. 


5. Proofs 
5.1. Theorem 3-4 

We prove our results here in the case of (3q dehned as in (24), which due to 
the non-independent rows is a slight generalization of the results using (17). 
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Adjusting the proofs present here to account for the latter case requires only 
the substitution of /iq for /5 q, showing this Theorem. 

5.2. Lemma 3.1. 

We begin by providing a brief proof for Lemma 3.1, which follows directly 
from the explicit form for the Hermite derivative as in (5.5.10) of [33]. 

Proof. Note that linearity of expectation allows us to take the expectation 
inside the sum, so that we may work on each term independently. For 1- 
dimensional orthonormal Hermite polynomials, where i represents the order 
of the polynomial (5.5.10) of [33] shows that 

^(H) = (43) 

As the tensor product of orthonormal polynomials is orthonormal, (21) for 
i = j follows from the derivative being exactly in the direction of a Hermite 
polynomial. For i ^ jf, (21) follows because each term in the sum is the 
expectation of a product of orthogonal polynomials that differ in at least one 
coordinate, and hence the integral is equal to zero. ■ 


5.3. Theorem 3.3 

To prove Theorem 3.3, we appeal to a matrix variant of the Chernoff 
bound [46], which is similar to approaches taken in [36, 35, 47, 16]. 

Proof. Recall that Q represents a truncation for the domain of H, and may 
be given by (18). We have that 

/ = E G Q) P(Q) + E (X^X|| G Of) P(Q"). (44) 


A brief calculation gives that 


ec := ||E(X^X|^ G Q) 


< 


P(Q) 


E X^X 


^ e Q' 



(45) 

(46) 


bounds the bias introduced from not accepting samples within Q^. We note 
that with the truncation in (18), and using this bound, cq < 0.1/\/P [9], for 
all N considered here, and in most similar problems. Specihcally, an analytic 
issue does not arise until consideration of exponential levels of oversampling. 
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This analytic issue concerns with the truncated rare events being reliably 
observed due to the very large sample pooh 

Restating (45), let Aniin(-) and Amax(‘) correspond to the smallest and 
largest eigenvalues of the argument matrix, respectively. Then for any arbi¬ 
trary set of columns, denoted S, 

l-eQ<A,„,„(E(Jlcy:,5)X(:,5)||e S)) (47) 

< A„„ (E (X^O, S)X(:,S)li e Q)) < 1 + eg. (48) 

From (24) we have that if each sample G Q, then for all k, 

\\Xki:,S)\\l<s(3Q, (49) 

holds uniformly for all choices of S such that |iS| < s. This provides an 
upper bound on the singular values of our independent self-adjoint matrices, 
X^(:, S)Xk(:, S), uniformly over all choices of S consisting of at most s 
elements. We define 


N 




(60) 


k=l 


An application of the Chernoff bound as in Theorem 1.1 of [46] and Theorem 
1 of [36] gives that for d G [0,1] and |iS| < s. 


P Kin (Ms) < (1-<5)(1-6q) 


P I Amax {Ms) > (1 -|- 5)(1 -|- Cq) 


G Q Vfc < 


G Q Vfc < 


N(l-eQ) 

,-(5 \ sPq 


{1-6) 


l-<5 


(51) 


iV(l + eQ) 

\ 


(l + <5) 


l+<5 


(52) 


Note that 


1 

1 

n\ 

IV 

1 


-l- 5)(1 -|- eg) < 1 -|-1 — 

<5< 


t-eQ_ 
1 - Cq’ 

i - gp 

1 + €q 


(53) 

(54) 
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and so we have a critical 6, given by 


6t (t — eQ)/{l + cq), (55) 

is such that for all S < St the matrix Mg is guaranteed to satisfy HiW ^—/||2 < 
t. Note that for 0 < <5 < 1, 


.-<5 


> 


( 1 _ 5 ) 1-5 - (1 + 5 ) 1 + 5 ’ 

and so we may bound the sum of the probabilities by 


(56) 


P IIM 5 -/II <t 


e Q VA; I < 2s 


(l-5i)i-^‘ 


hPq 


(57) 


We now bound this probability. We note that if cq < t then 

0 < <t-eQ. (58) 

i + Cq 

To create a bound without explicit dependence on eg, we note that for 


Q := A - eg + (t + eg) log(t + eg); 


,-5t 


Thus for t G (0,1), 


P ||Mc-I|| <t 


e Q VA; < 2 s exp 


< 2 s exp ( —C, 


q( 1 - eg)iV 
s/3q 

tN \ 


'Q 


(59) 

(60) 


where Cq is a reasonably large positive constant for most truncations. Via 
a union bound over the (^) possibilities of subsets of P with cardinality s, 
it follows that 


P 


sup \ras.^{Ms - I) > t 

\S\<s 




(61) 



Recalling that 


sup Amax(-^hr5 I'j 6, 
\S\<s 


gives 


+ log(^2 sQ^^. (62) 

We assume that having any sample G leads to an arbitrarily large 
Amax, hence yielding the bound, 

+ log(^2sQ^^ . (63) 

Using the relation, ^(<5^ < t) = 1 — P((5s > t), gives that 

P(5. <t)> P(Q)^ - exp + log ( 2 "(^))) • (64) 

Using the relation that 


P(5s > t) < 1 - P(Q)'^ + exp 


P(5s >t)< exp I —C, 


'Q 


Nt 

sI3q 



< 





it follows that 


< 2se" ; (65) 

log ^25^”^^^ < log( 2 s) + s + slog(P/s), ( 66 ) 

which completes the proof. Remark 3.2 follows from taking s = P and using 
that (p) = 1 . ■ 

We note that Corollary 1 follows from Theorem 3.3, by substituting p* for 
P((5s < f); substituting 5* for f; and performing some algebraic manipulation. 



5.4- Theorem 3.5 

The proof of Theorem 3.5 relies on the properties of the measurement 
matrices ^ and T' themselves. In an intuitive sense, the results follow from 
the two matrices having similar properties, but the gradient-enhanced matrix 
having more rows, yielding better conditioned Gramians. 
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Proof. We begin by showing R1 for the one-dimensional case. Note that for 
arbitrary and i, 


1 -h i 


< max{|V'i(Or, 


and that equality can only hold if for some i, which is an 

event that occurs with probability zero. As this inequality holds for all i 
and f, being strict for almost all R1 follows for the one-dimensional case. 
The d-dimensional analogue follows as the d-dimensional polynomials are 
tensor-products of the one-dimensional polynomials. 

To show R2, note that up to an invertible pre-multiplication, is a 
submatrix of and thus C Additionally, we notice that 

and ^ are almost surely full rank matrices. 

We next show R3 in the case of 1-dimensional Hermite polynomials. Here 
subscripts of matrices refer to the column corresponding to that polynomial 
order. We have by the normalization (22) and (43) that 


It follows that. 






Applying a supremum. 


V(1 + i)(l +j) 


sup < sup 

y (1 -I- ?) (1 -I- j) 

< sup 

V(l + z)(l+j) 


(67) 


where we have used the inequality 

1 + \/(» - i)(j - 1 ) , < i + \/g _ jggj 

v/(l + (i-l))(l + (i-l)) - V(l + i)(l+j) 

This shows the result for the one-dimensional case. The d-dimensional case 
leads to a decomposition in (67) with inner products of lower order along each 
dimension, and the inequality in (68), is replaced by d similar inequalities.■ 
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6. Conclusion 

Within the context of compressive sampling of sparse polynomial chaos 
expansions, we investigated f'l-minimization when derivative information of 
a quantity of interest (Qol) is present. We provided analysis on gradient- 
enhanced £i-minimization for Hermite polynomial chaos, in which we showed 
that, for a given normalization, including derivative information will not re¬ 
duce the stability of the £i-minimization problem. Further, we identihed a 
coherence parameter that we used to bound the associated Restricted Isome¬ 
try Constant, a useful and well-studied measure of the stability for solutions 
recovered by £i-minimization as in the context used here. 

Furthermore, we observed improved solution accuracy from gradient- 
enhanced f'l-minimization in three numerical examples: Manufactured poly¬ 
nomials; an elliptic equation; and a plane Poiseuille flow with random bound¬ 
aries. Consistently, gradient-enhanced f'l-minimization was seen to improve 
the quality of solution recovery at the same computational cost, or equiva¬ 
lently achieve the same solution quality at a reduced computational cost. As 
the Qol derivatives are often more sensitive to discretization errors than the 
Qol itself, so too is the accuracy of the solution obtained by the gradient- 
enhanced £i-minimization. This was empirically observed in the second nu¬ 
merical example considered, thereby suggesting high accuracy requirements 
on derivative calculations for the gradient-enhanced £i-minimization to be 
most effective. 
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